## Jul 14, 2022 | RISC-V Perf Analysis SIG Meeting

Attendees: Beeman Strong, LIU Zhiwei, tech.meetings@riscv.org

## **Notes**

- Attendees:
  - Marc Casas
  - John Simpson
  - Bruce Ableidinger
  - Zhiwei
  - Jeff Scheel
  - Mark Himelstein
  - Jessica Clark
  - Beeman
  - Anup Patel
  - Aaron Durbin
  - Atish
- Slides, audio, and video here
- Mark: where is Valgrind?
  - Beeman: listed as a gap but WIP from PLCT
  - Mark: check with Wei Wu at PLCT to see if he can find a representative for this group
- Mark: soon we'll have a formal way to make enabling status accessible to all
  - Want to have a story for where we are for Linux Plumbers and other upcoming conferences
- Call for chairs window closed, expect interviews next week
- No other feedback on IOMMU PMU, or other opens
- Marc: from BSC, involved in HPC perf analysis and tools
- Switched to Marc's slides
- Demoing example of Paraver using trace collection from performance counters (from FPGA)
  - Uses Extrae & PAPI to collect traces
- Viewing code lines over time, can see iterations of execution phases
  - Added functions over time, synchronized with code lines view
  - Combined the two, with coloring of code line data points by function
  - Can copy/paste the same scale for each window
- Added sampling of performance counters, events multiplexed over time
  - Added IPC and L1 MPKI over time
  - Added VPU utilization, can see areas where it isn't used. Following the code line view back to the source line showed a missing pragma to vectorize.
  - Added VPU mem insts vs VPU arith insts. Can craft metric views, showed ratio of mem vs arith ("derived timeline")
  - Add FPU inst counts
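
Editor's sketch of the derived metrics mentioned above (IPC, L1 MPKI, and the VPU mem vs arith ratio), computed per sampling interval. This is not Paraver's or Extrae's implementation; the field names (`cycles`, `instructions`, `l1_misses`, `vpu_mem_insts`, `vpu_arith_insts`) and the sample values are hypothetical, and each sample is assumed to hold counter deltas for one interval.

```python
from typing import Optional


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def derive_metrics(sample: dict) -> dict:
    """Compute derived metrics from one interval of counter deltas.

    The keys read from `sample` are hypothetical names, not real event names.
    """
    misses_per_inst = ratio(sample["l1_misses"], sample["instructions"])
    return {
        # Instructions retired per cycle in this interval.
        "ipc": ratio(sample["instructions"], sample["cycles"]),
        # L1 misses per thousand instructions.
        "l1_mpki": None if misses_per_inst is None else 1000.0 * misses_per_inst,
        # Vector memory instructions per vector arithmetic instruction.
        "vpu_mem_per_arith": ratio(sample["vpu_mem_insts"], sample["vpu_arith_insts"]),
    }


# Made-up samples: the second interval uses no vector instructions at all.
samples = [
    {"cycles": 2000, "instructions": 1000, "l1_misses": 20,
     "vpu_mem_insts": 300, "vpu_arith_insts": 100},
    {"cycles": 1000, "instructions": 1500, "l1_misses": 3,
     "vpu_mem_insts": 0, "vpu_arith_insts": 0},
]

# One derived record per interval, i.e. a simple "derived timeline".
timeline = [derive_metrics(s) for s in samples]
```

Returning `None` for an interval with no vector arithmetic keeps an unused VPU visible as a gap rather than as a misleading zero or infinity.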

- Demo summarized in slides
- Jessica: do you have plans to adapt tooling to work with standard RISC-V architectures?
  - Marc: Extrae can be installed on any standard RISC-V part today
- Bruce: how often are samples being taken?
  - Marc: uniform distribution at 30ms (avg) with 30ms variance. Has overhead, can see Extrae sample collections in code lines plots.
- Bruce: why not implement more counters in your FPGA, to avoid multiplexing?
  - Marc: just saving area cost. Lots of discussions about this. Have many events but not enough HW counters to count them at once.
- Extrae file specifies counter sets that you want, sample rate, etc
- PAPI port is ongoing, just a partial solution right now. Not sure if it will be upstreamed but think so.
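
Editor's sketch of how a multiplexed event count is commonly extrapolated when there are more events than HW counters, as discussed above. It uses the enabled-time over running-time scaling that Linux perf applies to multiplexed events; whether the Extrae and PAPI setup shown in the demo scales counts this way was not stated in the meeting, and the function and parameter names are hypothetical.

```python
from typing import Optional


def scale_multiplexed_count(raw_count: int,
                            time_enabled: float,
                            time_running: float) -> Optional[float]:
    """Estimate the full-interval count for an event that was only
    scheduled on a hardware counter for part of the interval.

    raw_count    -- events counted while the event occupied a counter
    time_enabled -- total time the event was requested
    time_running -- time the event was actually on a hardware counter
    """
    if time_running == 0:
        # The event never got a counter, so no estimate is possible.
        return None
    # Linear extrapolation: assumes the event rate while unscheduled
    # matches the rate observed while the event was scheduled.
    return raw_count * time_enabled / time_running


# Made-up example: the event held a counter for 10 ms of a 30 ms interval.
estimate = scale_multiplexed_count(500, time_enabled=30.0, time_running=10.0)
```

The estimate is only as good as the assumption of a steady event rate, which is one reason dedicated counters avoid the error that multiplexing introduces.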

## **Action items**

- Beeman Strong: check with Wei Wu at PLCT on Valgrind, and about representation in the group